AITopics | convergence speed

NTKMTL: Mitigating Task Imbalance in Multi-Task Learning from Neural Tangent Kernel Perspective

Neural Information Processing SystemsJun-23-2026, 07:04:09 GMT

Multi-Task Learning (MTL) enables a single model to learn multiple tasks simultaneously, leveraging knowledge transfer among tasks for enhanced generalization, and has been widely applied across various domains. However, task imbalance remains a major challenge in MTL. Although balancing the convergence speeds of different tasks is an effective approach to address this issue, it is highly challenging to accurately characterize the training dynamics and convergence speeds of multiple tasks within the complex MTL system. To this end, we attempt to analyze the training dynamics in MTL by leveraging Neural Tangent Kernel (NTK) theory and propose a new MTL method, NTKMTL. Specifically, we introduce an extended NTK matrix for MTL and adopt spectral analysis to balance the convergence speeds of multiple tasks, thereby mitigating task imbalance. Based on the approximation via shared representation, we further propose NTKMTL-SR, achieving training efficiency while maintaining competitive performance. Extensive experiments demonstrate that our methods achieve state-of-the-art performance across a wide range of benchmarks, including both multi-task supervised learning and multi-task reinforcement learning.

artificial intelligence, deep learning, machine learning, (15 more...)

Neural Information Processing Systems

Country: Europe (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Sinusoidal Initialization, Time for a New Start

Neural Information Processing SystemsJun-17-2026, 19:29:54 GMT

Initialization plays a critical role in Deep Neural Network training, directly influencing convergence, stability, and generalization. Common approaches such as Glorot and He initializations rely on randomness, which can produce uneven weight distributions across layer connections. In this paper, we introduce the Sinusoidal initialization, a novel deterministic method that employs sinusoidal functions to construct structured weight matrices expressly to improve the spread and balance of weights throughout the network while simultaneously fostering a more uniform, well-conditioned distribution of neuron activation states from the very first forward pass. Because Sinusoidal initialization begins with weights and activations that are already evenly and efficiently utilized, it delivers consistently faster convergence, greater training stability, and higher final accuracy across a wide range of models, including convolutional neural networks, vision transformers, and large language models. On average, our experiments show an increase of 4.9% in final validation accuracy and 20.9% in convergence speed. By replacing randomness with structure, this initialization provides a stronger and more reliable foundation for Deep Learning systems.

artificial intelligence, initialization, machine learning, (20 more...)

Neural Information Processing Systems

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

NTKMTL: Mitigating Task Imbalance in Multi-Task Learning from Neural Tangent Kernel Perspective

Neural Information Processing SystemsJun-16-2026, 22:02:17 GMT

Multi-Task Learning (MTL) enables a single model to learn multiple tasks simultaneously, leveraging knowledge transfer among tasks for enhanced generalization, and has been widely applied across various domains. However, task imbalance remains a major challenge in MTL. Although balancing the convergence speeds of different tasks is an effective approach to address this issue, it is highly challenging to accurately characterize the training dynamics and convergence speeds of multiple tasks within the complex MTL system. To this end, we attempt to analyze the training dynamics in MTL by leveraging Neural Tangent Kernel (NTK) theory and propose a new MTL method, NTKMTL. Specifically, we introduce an extended NTK matrix for MTL and adopt spectral analysis to balance the convergence speeds of multiple tasks, thereby mitigating task imbalance. Based on the approximation via shared representation, we further propose NTKMTL-SR, achieving training efficiency while maintaining competitive performance. Extensive experiments demonstrate that our methods achieve state-of-the-art performance across a wide range of benchmarks, including both multi-task supervised learning and multi-task reinforcement learning.

artificial intelligence, machine learning, proceedings, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

FedQS: Optimizing Gradient and Model Aggregation for Semi-Asynchronous Federated Learning

Neural Information Processing SystemsJun-12-2026, 23:17:31 GMT

Federated learning (FL) enables collaborative model training across multiple parties without sharing raw data, with semi-asynchronous FL (SAFL) emerging as a balanced approach between synchronous and asynchronous FL. However, SAFL faces significant challenges in optimizing both gradient-based (e.g., FedSGD) and model-based (e.g., FedAvg) aggregation strategies, which exhibit distinct trade-offs in accuracy, convergence speed, and stability. While gradient aggregation achieves faster convergence and higher accuracy, it suffers from pronounced fluctuations, whereas model aggregation offers greater stability but slower convergence and suboptimal accuracy. This paper presents FedQS, the first framework to theoretically analyze and address these disparities in SAFL. FedQS introduces a to handle client heterogeneity by classifying clients into four distinct types and adaptively optimizing their local training based on data distribution characteristics and available computational resources. Extensive experiments on computer vision, natural language processing, and real-world tasks demonstrate that FedQS achieves the highest accuracy, attains the lowest loss, and ranks among the fastest in convergence speed, outperforming state-of-the-art baselines.

artificial intelligence, natural language, proceedings, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language (0.97)

Add feedback

Stochastic Structured Prediction under Bandit Feedback

Artem Sokolov, Julia Kreutzer, Stefan Riezler, Christopher Lo

Neural Information Processing SystemsMar-23-2026, 13:06:07 GMT

Neural Information Processing Systems http://nips.cc/

information retrieval, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country: Europe > Germany (0.14)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.52)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.52)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.50)
(2 more...)

Add feedback

Carlo Settings Through Counterfactual Value Based Fictitious Play Ju Qi

Neural Information Processing SystemsFeb-18-2026, 00:02:28 GMT

The Counterfactual Regret Minimization (CFR) algorithm Zinkevich et al. [2007] is crucial for solving extensive-form games with incomplete information.

algorithm, artificial intelligence, machine learning, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > Texas (0.05)
Asia > China (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

edd0d433f8a1a51aa11237a6543fc280-Supplemental-Conference.pdf

Neural Information Processing SystemsFeb-17-2026, 20:02:02 GMT

artificial intelligence, dac-detr, machine learning, (18 more...)

Neural Information Processing Systems

Country:

Europe > Switzerland > Zürich > Zürich (0.15)
Europe > Switzerland > Basel-City > Basel (0.05)

Technology:

Information Technology > Artificial Intelligence > Vision (0.74)
Information Technology > Artificial Intelligence > Machine Learning (0.52)

Add feedback

986292a930c3692168b177a770025ab3-Paper-Conference.pdf

Neural Information Processing SystemsFeb-16-2026, 20:53:28 GMT

large language model, machine learning, natural language, (16 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.04)
North America > Dominican Republic (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Energy (0.46)
Education > Educational Setting (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Add feedback

Birder: Communication-Efficient 1-bit Adaptive Optimizer for Practical Distributed DNN Training

Neural Information Processing SystemsFeb-15-2026, 10:41:34 GMT

Therefore, from a system-level perspective, the design ethos of a system-efficient communication-compression algorithm is that we should guarantee that the compression/decompression of the algorithm is computationally light and takes less time, and it should also be friendly to efficient collective communication primitives.

artificial intelligence, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country: Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report > New Finding (0.67)

Technology: